Prediction and predictability of global epidemics: 
the role of the airline transportation network 

Vittoria Colizza^, Alain Barrat^, Marc Barthelemy^, and Alessandro Vespignani^ 

February 7, 2008 



1 School of Informatics and Biocomplexity Center, Indiana University, Bloomington, IN 47406, USA
2 UMR du CNRS 8627, LPT Bâtiment 210, Université de Paris-Sud, 91405 Orsay Cedex, France

Abstract 

The systematic study of large-scale networks has unveiled the ubiquitous presence of connectivity patterns characterized by large-scale heterogeneities and unbounded statistical fluctuations. These features dramatically affect the behavior of the diffusion processes occurring on networks, determining the ensuing statistical properties of their evolution pattern and dynamics. In this paper, we investigate the role of the large-scale properties of the airline transportation network in determining the global evolution of emerging diseases. We present a stochastic computational framework for the forecast of global epidemics that considers the complete worldwide air travel infrastructure complemented with census population data. We address two basic issues in global epidemic modeling: i) we study the role of the large-scale properties of the airline transportation network in determining the global diffusion pattern of emerging diseases; ii) we evaluate the reliability of forecasts and outbreak scenarios with respect to the intrinsic stochasticity of disease transmission and traffic flows. To address these issues, we define a set of novel quantitative measures able to characterize the level of heterogeneity and predictability of the epidemic pattern. These measures may be used for the analysis of containment policies and epidemic risk assessment.

The mathematical modeling of epidemics has often dealt with the prediction and predictability of outbreaks in real populations with complicated social and spatial structures and with heterogeneous patterns in the contact network [1, 2, 3, 4, 5, 6, 7, 8]. All these factors have led to sophisticated modeling approaches including disease realism, meta-population grouping, stochasticity and, more recently, agent-based numerical simulations that recreate entire populations and their dynamics at the scale of the single individual. In many instances, however, the introduction of the inherent complex features and emerging properties […] of the network in which epidemics occur implies the breakdown of standard homogeneous approaches […] and calls for a systematic investigation of the impact of the detailed system's characteristics on the evolution of the epidemic outbreak. These considerations are particularly relevant in the study of the geographical spread of epidemics, where the various long-range heterogeneous connections typical of modern transportation networks naturally give rise to a very complicated evolution of epidemics, characterized by heterogeneous and seemingly erratic outbreaks [14, 15], as recently documented in the SARS case [16]. In this context, air transportation represents a major channel of epidemic propagation, as pointed out in the modeling approach to global epidemic diffusion of Rvachev and Longini [17], which capitalized on previous studies of the Russian airline network [18]. Similar modeling approaches, even if limited by a very partial knowledge of the worldwide transportation network, have been used to study specific outbreaks such as pandemic influenza [19, 20, 21], HIV [22] and, very recently, SARS [23]. The availability of the complete worldwide airport network (WAN) dataset and the recent extensive studies of its topology [24, 25] are finally allowing a full-scale computational study of global epidemics. In the following we consider, for the first time, a global stochastic epidemic model including the full International Air Transport Association (IATA) [26] database, aiming at a detailed study of the interplay between the network structure and the stochastic features of the infection dynamics in defining the global spreading of epidemics. In particular, while previous studies have generally focused on the a posteriori analysis of real case studies of global epidemics, the large-scale modeling presented here allows us to address more basic theoretical issues, such as the statistical properties of the epidemic pattern and the effect on it of the complex architecture of the underlying transportation network. Finally, such a detailed level of description allows, for the first time, the quantitative assessment of the reliability of the obtained forecast with respect to the stochastic nature of disease transmission and travel flows, the outbreak initial conditions and the network structure.

The air transportation network 

In the following we use the International Air Transport Association (IATA) [26] database, which contains the world list of airport pairs connected by direct flights and the number of available seats on each connection for the year 2002. The resulting worldwide air-transportation network (WAN) is therefore a weighted graph comprising $V = 3880$ vertices denoting airports and $E = 18810$ weighted edges, whose weight $w_{j\ell}$ accounts for the passenger flow between airports $j$ and $\ell$. This dataset has been complemented with the population $N_j$ of the large metropolitan area served by each airport, obtained from different sources. The final network dataset contains the 3100 largest airports and 17182 edges (accounting for 99% of the worldwide traffic), together with the corresponding urban population data. The obtained network is highly heterogeneous both in its connectivity pattern and in its traffic capacities (see Fig. 1). The probability distributions that an airport $j$ has $k_j$ connections (degree) to other airports and handles a number $T_j = \sum_\ell w_{j\ell}$ of passengers (traffic) exhibit heavy tails and very large statistical fluctuations [24, 25]. Analogously, the probability that a connection carries a traffic $w$ is skewed and heavy-tailed. Finally, we associate with each airport a city whose population is heavy-tailed distributed, in agreement with the general result of Zipf's law for city sizes [27]. More strikingly, these quantities appear to have non-linear associations among them. This is clearly shown by the relation between the traffic $T$ handled by each airport and its number of connections $k$, which follows the non-linear form $T \sim k^{\beta}$ with $\beta \simeq 1.5$ [24]. Analogously, the city population and the traffic handled by the corresponding airport follow the non-linear relation $N \sim T^{\alpha}$ with $\alpha \simeq 0.5$, in contrast with the linear behavior assumed in previous analyses [23]. The presence of broad statistical distributions and non-linear relations among the various quantities indicates a possible major impact on the ensuing disease spreading pattern.
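The degree $k_j$, the traffic $T_j = \sum_\ell w_{j\ell}$ and an estimate of the exponent $\beta$ in $T \sim k^{\beta}$ can all be computed from a weighted edge list. The minimal Python sketch below shows one way to do this. It assumes an undirected edge list of `(j, l, w)` tuples in which each connection appears once, and the function names are hypothetical. An ordinary least-squares fit in log-log space is only one simple estimator; it is not necessarily how the values quoted above were obtained.

```python
import math
from collections import defaultdict


def degree_and_traffic(edges):
    """Return (k, T): degree k[j] and traffic T[j] = sum_l w_jl for each airport.

    `edges` is an iterable of (j, l, w) tuples describing an undirected
    weighted graph in which each connection appears exactly once
    (an assumption of this sketch).
    """
    k = defaultdict(int)
    T = defaultdict(float)
    for j, l, w in edges:
        for airport in (j, l):
            k[airport] += 1
            T[airport] += w
    return dict(k), dict(T)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x).

    Applied to pairs (k_j, T_j) this estimates beta in T ~ k**beta.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var


# Toy example with three airports and made-up seat counts.
k, T = degree_and_traffic([("A", "B", 100), ("A", "C", 50), ("B", "C", 10)])
```

On real data, the pairs `(k[j], T[j])` for all airports would be passed to `loglog_slope`. Binning by degree before fitting is a common alternative when the scatter is large.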

Modeling global epidemics 

As a basic element of our modeling approach, we assume the standard compartmentalization in which each individual can exist in only one of a set of discrete states, such as susceptible (S), latent (L), infected (I), permanently recovered (R), etc. In each city $j$ the population is $N_j$, and $X_j^{[m]}(t)$ is the number of individuals in class $[m]$ at time $t$. By definition it follows that $N_j = \sum_m X_j^{[m]}(t)$. In each city $j$, individuals are allowed to travel from one city to another by means of the airline transportation network and to change compartment because of the infection dynamics within the city, similarly to the models in refs. [17, 20] and to the stochastic generalization of ref. […].

Transport operator. The movement of individuals due to travel between cities is described by the transport operator $\Omega_j(\{X^{[m]}\})$, which represents the net balance of individuals in a given class $X^{[m]}$ that entered and left each city $j$. This operator is a function of the traffic flows $w_{j\ell}$ per unit time and of the city populations $N_j$, and might also include transit passengers on connecting flights. The number of passengers of each category traveling from a city $j$ to a city $\ell$ is an integer random variable: each of the $X_j^{[m]}$ potential travellers has a probability $p_{j\ell} = w_{j\ell}\,\Delta t/N_j$ to go from $j$ to $\ell$ in the time interval $\Delta t$. In each city $j$, the numbers of passengers $\xi_{j\ell}$ traveling on each connection $j \to \ell$ at time $t$ define a set of stochastic variables that follow the multinomial distribution

$$
P\left(\{\xi_{j\ell}\}\right) = \frac{X_j^{[m]}!}{\left(X_j^{[m]} - \sum_\ell \xi_{j\ell}\right)!\,\prod_\ell \xi_{j\ell}!}\,\prod_\ell p_{j\ell}^{\,\xi_{j\ell}}\left(1 - \sum_\ell p_{j\ell}\right)^{X_j^{[m]} - \sum_\ell \xi_{j\ell}}, \tag{1}
$$

where $X_j^{[m]} - \sum_\ell \xi_{j\ell}$ is the number of individuals who do not travel. We use standard numerical subroutines to generate random numbers of travellers following these distributions.
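The multinomial draw described above can be sketched with the Python standard library alone. Each of the $X_j^{[m]}$ individuals independently either picks a destination $\ell$ with probability $p_{j\ell}$ or stays with probability $1 - \sum_\ell p_{j\ell}$. This is a minimal sketch, not the authors' code, and the function name `sample_travellers` is hypothetical. Because it samples one individual at a time, it runs in $O(X_j^{[m]})$ time; a production code would call a dedicated multinomial sampler instead.

```python
import random
from collections import Counter


def sample_travellers(x_j, p_j, rng=random):
    """Draw the numbers xi_jl of individuals leaving city j on each connection.

    x_j : number of individuals of one class in city j (a non-negative int)
    p_j : {l: p_jl} with p_jl = w_jl * dt / N_j and sum(p_jl) <= 1
    Returns {l: xi_jl}; the other x_j - sum(xi_jl) individuals stay in j.
    """
    stay = 1.0 - sum(p_j.values())
    if stay < 0:
        raise ValueError("travel probabilities must sum to at most 1")
    outcomes = list(p_j) + [None]            # None stands for "does not travel"
    weights = list(p_j.values()) + [stay]
    # Every individual independently picks one outcome, which is exactly a
    # multinomial draw; done naively here in O(x_j) time for readability.
    counts = Counter(rng.choices(outcomes, weights=weights, k=x_j))
    return {l: counts[l] for l in p_j}


rng = random.Random(42)
xi = sample_travellers(100_000, {"B": 0.1, "C": 0.2}, rng)
```

With these made-up probabilities, about 10% of the individuals go to `B`, 20% go to `C`, and the rest stay.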
The transport operator in each city $j$ is therefore written as

$$
\Omega_j\left(\{X^{[m]}\}\right) = \sum_\ell \left[\xi_{\ell j}\left(X_\ell^{[m]}\right) - \xi_{j\ell}\left(X_j^{[m]}\right)\right], \tag{2}
$$

where the mean and variance of the stochastic variables are $\langle \xi_{j\ell}(X_j^{[m]}) \rangle = p_{j\ell} X_j^{[m]}$ and $\mathrm{Var}(\xi_{j\ell}(X_j^{[m]})) = p_{j\ell}(1 - p_{j\ell}) X_j^{[m]}$. In addition, since the traffic flows are expressed as the number of available seats on a given connection, the transport operator is in general affected by fluctuations due to an aircraft occupancy rate different from 1. This introduces a further source of noise: on each connection $(j, \ell)$ the flux of passengers at each time $t$ is given by the stochastic variable

$$
w_{j\ell}(t) = w_{j\ell}\left[\alpha + \eta\,(1 - \alpha)\right], \tag{3}
$$

where $\alpha = 0.7$ corresponds to the average occupancy rate of 70% provided by official statistics and $\eta$ is a random number drawn uniformly in the interval $[-1, 1]$ at each time step.
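Putting Eqs. (2) and (3) together, one step of the stochastic transport operator for a single compartment might look like the Python sketch below. The function names and the dictionary-based data layout are hypothetical. The sketch assumes directed seat counts $w_{j\ell}$ per unit time. It also draws an independent $\eta$ for every connection at every call, which is a simplifying assumption: the text does not say whether $\eta$ is shared across compartments.

```python
import random
from collections import Counter, defaultdict

ALPHA = 0.7  # average occupancy rate quoted in the text


def noisy_flux(w, rng, alpha=ALPHA):
    """Eq. (3): passengers on a connection with w seats, w * [alpha + eta*(1 - alpha)]."""
    eta = rng.uniform(-1.0, 1.0)
    return w * (alpha + eta * (1.0 - alpha))


def transport_step(X, N, W, dt, rng):
    """Net balance Omega_j of Eq. (2) for one compartment, for every city j.

    X  : {city: number of individuals of this compartment (int)}
    N  : {city: population}
    W  : {(j, l): seats per unit time on the directed connection j -> l}
    Returns {city: arrivals - departures}.
    """
    probs = defaultdict(dict)
    for (j, l), w in W.items():
        probs[j][l] = noisy_flux(w, rng) * dt / N[j]   # p_jl for this step
    omega = defaultdict(int)
    for j, p_j in probs.items():
        stay = 1.0 - sum(p_j.values())
        if stay < 0:
            raise ValueError("dt too large: travel probabilities exceed 1")
        dests = list(p_j) + [None]                     # None = stays in j
        draws = rng.choices(dests, weights=list(p_j.values()) + [stay], k=X[j])
        counts = Counter(draws)
        for l in p_j:
            omega[j] -= counts[l]                      # xi_jl leave j ...
            omega[l] += counts[l]                      # ... and arrive in l
    return dict(omega)


rng = random.Random(1)
X = {"A": 500, "B": 300}
N = {"A": 1000, "B": 1000}
W = {("A", "B"): 100.0, ("B", "A"): 100.0}
omega = transport_step(X, N, W, dt=1.0, rng=rng)
```

Every departure is counted once as a loss and once as a gain, so the values of `omega` always sum to zero. Travel therefore redistributes individuals without creating or destroying any.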



Infection dynamics. The dynamics of the individuals $X^{[m]}$ between the different compartments depends on the specific disease considered. In compartmental models there are two possible elementary processes ruling the disease dynamics. The first class of processes refers to the spontaneous transition of one individual from one compartment $[m]$ to another compartment $[h]$. Processes of this kind are the spontaneous recovery of infected individuals ($I \to R$) or the passage from a latent condition to an infectious one ($L \to I$) after the incubation period. In this case the variation in the number of individuals $X^{[m]}$ is simply given by $\sum_h \nu_h^m \alpha_h X^{[h]}$, where $\alpha_h$ is the rate of transition from the class $[h]$ and $\nu_h^m \in \{-1, 0, 1\}$ is the change in the number of $X^{[m]}$ due to the spontaneous process from or to the compartment $[h]$. The second class of processes refers to binary interactions among individuals, such as the contagion of a susceptible interacting with an infectious individual ($S + I \to 2I$). In the homogeneous assumption, the rate of variation of individuals $X^{[m]}$ is given by $\sum_{h,g} \nu_{h,g}^m \alpha_{h,g} N^{-1} X^{[h]} X^{[g]}$, where $\alpha_{h,g}$ is the transition rate of the process and $\nu_{h,g}^m \in \{-1, 0, 1\}$ is the change in the number of $X^{[m]}$ due to the interaction. The factor $N^{-1}$, where $N$ is the number of individuals, arises because the above expression uses the homogeneous approximation: the probability for each individual of class $[h]$ to interact with an individual of class $[g]$ is simply proportional to the density $X^{[g]}/N$ of such individuals (other cases can, however, also be considered […]).
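The bookkeeping of spontaneous and binary processes can be made concrete with a small Python function. It returns the deterministic rate of change of each compartment in one city under homogeneous mixing. The data layout (lists of processes with their rates $\alpha$ and change coefficients $\nu$) is a hypothetical choice for illustration, and the SIR parameter values in the usage are arbitrary.

```python
def drift(X, spontaneous, binary):
    """Deterministic rate of change dX[m]/dt of every compartment in one city.

    X           : {compartment: number of individuals}
    spontaneous : list of (h, alpha_h, {m: nu_h^m}) for processes like I -> R
    binary      : list of (h, g, alpha_hg, {m: nu_hg^m}) for processes like S + I -> 2I
    Binary processes use the homogeneous-mixing rate alpha_hg * X[h] * X[g] / N.
    """
    N = sum(X.values())
    dX = {m: 0.0 for m in X}
    for h, alpha, nu in spontaneous:
        for m, change in nu.items():
            dX[m] += change * alpha * X[h]
    for h, g, alpha, nu in binary:
        for m, change in nu.items():
            dX[m] += change * alpha * X[h] * X[g] / N
    return dX


# SIR example with arbitrary rates beta = 0.5 and mu = 0.2 per day.
rates = drift(
    {"S": 990, "I": 10, "R": 0},
    spontaneous=[("I", 0.2, {"I": -1, "R": +1})],
    binary=[("S", "I", 0.5, {"S": -1, "I": +1})],
)
```

Because each process moves individuals between compartments, the rates sum to zero and the city population is conserved.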

Stochastic formulation of the global spreading model. In order to go beyond the usual deterministic approximations, in each city we work directly with the master equations for the processes described above […]. Under the assumption of large populations we obtain the Langevin equations, in which each reaction process is associated with a noise term whose amplitude is proportional to the square root of the reaction term [28, 29, 30]. The epidemic Langevin equations are coupled to one another by the stochastic transport operator that describes movements of individuals from one city to another. They can be numerically solved by considering the discretized evolution equations for small time steps $\Delta t$, which read

$$
\begin{aligned}
X_j^{[m]}(t + \Delta t) - X_j^{[m]}(t) ={}& \sum_{h,g} \nu_{h,g}^m \alpha_{h,g} N_j^{-1} X_j^{[h]} X_j^{[g]}\,\Delta t + \sum_h \nu_h^m \alpha_h X_j^{[h]}\,\Delta t \\
&+ \sum_{h,g} \nu_{h,g}^m \sqrt{\alpha_{h,g} N_j^{-1} X_j^{[h]} X_j^{[g]}\,\Delta t}\;\eta_{h,g} + \sum_h \nu_h^m \sqrt{\alpha_h X_j^{[h]}\,\Delta t}\;\eta_h + \Omega_j\left(\{X^{[m]}\}\right),
\end{aligned} \tag{4}
$$

where $\eta_{h,g}$ and $\eta_h$ are statistically independent Gaussian random variables with zero mean and unit variance. $\Omega_j(\{X\})$ is the stochastic travel operator defined in the previous paragraph, which depends on the traveling probabilities $p_{j\ell} = w_{j\ell}\,\Delta t/N_j$ obtained from the IATA dataset. The model is thus a compartmental system of stochastic differential equations that can be numerically integrated. It is worth mentioning, however, that the standard integration of these equations by Cauchy-Euler methods leads to a well-known technical problem, and specific techniques must be used to avoid an asymmetric truncation of the noise terms [31].
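Below is a minimal sketch of one Euler step of the Langevin equation (4) for SIR dynamics in a single isolated city, with the transport term omitted. The function name and the values of $\beta$, $\mu$ and $\Delta t$ in the usage are arbitrary choices for illustration. The sketch keeps compartments non-negative by naive clipping. That is precisely the kind of asymmetric truncation of the noise that the text warns against, so it is not one of the specific techniques the text refers to.

```python
import math
import random


def langevin_sir_step(S, I, R, beta, mu, dt, rng):
    """One Euler step of the SIR Langevin equations of Eq. (4) in one isolated city.

    Assumes N = S + I + R > 0; the transport term Omega_j is left out.
    """
    N = S + I + R
    infection = beta * S * I / N * dt      # alpha_{S,I} N^-1 S I dt
    recovery = mu * I * dt                 # alpha_I I dt
    # Each reaction gets its own unit Gaussian noise, with amplitude equal to
    # the square root of the reaction term; the same noise enters S and I
    # (or I and R) with opposite signs, so N is conserved.
    infection += math.sqrt(infection) * rng.gauss(0.0, 1.0)
    recovery += math.sqrt(recovery) * rng.gauss(0.0, 1.0)
    # Naive clipping keeps every compartment non-negative. This is the kind of
    # asymmetric truncation of the noise the text warns about; kept for brevity.
    infection = min(max(infection, 0.0), S)
    recovery = min(max(recovery, 0.0), I)
    return S - infection, I + infection - recovery, R + recovery


rng = random.Random(3)
state = (9990.0, 10.0, 0.0)
for _ in range(200):
    state = langevin_sir_step(*state, beta=0.5, mu=0.2, dt=1.0, rng=rng)
```

In the full model, this local step would be followed in every city by the transport operator applied to each compartment.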

The SIR dynamics. 

Global epidemic forecasts would be extremely relevant in the case of the emergence of a new pandemic influenza, which in general spreads rapidly, with substantial transmission occurring before the onset of case-defining symptoms. In the following we adopt a minimal model for pandemic spread, to provide a general discussion that is not hindered by very complicated disease transmission mechanisms. Specific characteristics of the disease, such as latency, incubation and seasonal effects, can however be easily implemented in the present framework [17]. The analysis of two case studies and the comparison of the forecasts obtained with the present approach against real data will be presented elsewhere. Here, we use the simple susceptible-infected-removed (SIR) dynamics, in which a fully mixed population is assumed within each city. In each city $j$ the population is given by $N_j = S_j(t) + I_j(t) + R_j(t)$, where $S_j(t)$, $I_j(t)$ and $R_j(t)$ represent the number of susceptible, infected and recovered individuals at time $t$, respectively. The epidemic evolution is governed by the basic dynamical evolution of the SIR model, in which the probability that a susceptible individual acquires the infection from any given infected individual in the time interval $dt$ is proportional to $\beta\,dt$, where $\beta$ is the transmission parameter that captures the aetiology of the infection process. At the same time, infected individuals recover with probability $\mu\,dt$, where $\mu^{-1}$ is the average duration of the infection. We consider the three compartments S, I and R in Eq. (4) and plug in $\alpha_{S,I} = \beta$ and $\alpha_I = \mu$, with the corresponding parameters $\nu_{S,I}^{I} = -\nu_{S,I}^{S} = 1$ and $\nu_I^R = -\nu_I^I = 1$. This gives explicitly the $3100 \times 3$ differential equations whose integration provides the disease evolution in every urban area served by an airport. The results shown in the following are obtained with the global spreading model based on an SIR dynamics within each city.
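The substitution into Eq. (4) can be written out explicitly for clarity. With $\alpha_{S,I} = \beta$, $\alpha_I = \mu$, $\nu^{I}_{S,I} = -\nu^{S}_{S,I} = 1$ and $\nu^{R}_{I} = -\nu^{I}_{I} = 1$, a direct expansion (our own algebra, not quoted from the source) gives for each city $j$, with $\Delta X_j = X_j(t + \Delta t) - X_j(t)$:

$$
\begin{aligned}
\Delta S_j &= -\beta \frac{S_j I_j}{N_j}\,\Delta t - \sqrt{\beta \frac{S_j I_j}{N_j}\,\Delta t}\;\eta_{S,I} + \Omega_j(\{S\}),\\
\Delta I_j &= \beta \frac{S_j I_j}{N_j}\,\Delta t - \mu I_j\,\Delta t + \sqrt{\beta \frac{S_j I_j}{N_j}\,\Delta t}\;\eta_{S,I} - \sqrt{\mu I_j\,\Delta t}\;\eta_I + \Omega_j(\{I\}),\\
\Delta R_j &= \mu I_j\,\Delta t + \sqrt{\mu I_j\,\Delta t}\;\eta_I + \Omega_j(\{R\}).
\end{aligned}
$$

Summing the three equations cancels all reaction and noise terms, so $N_j$ changes only through the transport terms $\Omega_j$.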

As a first step, it is possible to monitor standard epidemiological quantities such as the number of infected individuals, the morbidity and the prevalence at different granularity levels, i.e. country, state or administrative region. In Fig. 2 we show the dynamical evolution in the US of an epidemic starting in Hong Kong. The evolution of epidemic outbreaks is monitored by recording at each time step (1 day) the density of individuals in each class (S, I, R) present in each city. The parameters $\beta$ and $\mu$ are chosen according to [21] in order to use biologically sound values, and are kept constant during the evolution (different values do not lead to different overall conclusions). This amounts to assuming that no travel restrictions or targeted prophylaxis measures are implemented during the outbreak. We group the states into the nine influenza surveillance regions, which are identical to the nine divisions of the US census, and we use two different visualization strategies. In the first set of maps, regions are drawn with their normal size and a color code gives the prevalence of the infection in each region, i.e. the fraction of infected individuals. This representation readily shows the high heterogeneity of the pandemic evolution. While useful, such a visualization might be misleading, since the same prevalence in two regions might correspond to very different numbers of infected individuals if the regions have very different populations. Moreover, strong population density heterogeneities are common, and it is not easy to detect visually a large level of contamination in a small but densely populated geographical area. To obtain a geographical representation that carries information both on the level of infection and on the number of cases in each region, we have constructed cartograms of the original maps, in which the size of each geographic region (in our case, the US influenza surveillance regions) is rescaled according to its population. Several methods for constructing cartograms have been developed (see [32] and references therein); here we have adopted the diffusion-based method [33], which produces cartograms by equalizing the population density through a linear diffusion process. The geographical map representation readily shows the heterogeneity of the spatio-temporal epidemic evolution. However, a quantitative characterization of this heterogeneity, and of its relation with the statistical properties of the air transportation network, remains a major open issue.
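Equal prevalence can hide very different case counts, and this becomes easy to see when city-level results are aggregated to regions. The Python sketch below returns both the number of cases and the prevalence for each region. The city-to-region mapping, the numbers and the function name are hypothetical. A cartogram would then scale each region's drawn area by its population, so that cases become visible alongside prevalence.

```python
from collections import defaultdict


def regional_summary(city_region, I, N):
    """Aggregate city-level infected counts I and populations N by region.

    Returns {region: (cases, prevalence)} with prevalence = cases / population.
    """
    cases = defaultdict(float)
    population = defaultdict(float)
    for city, region in city_region.items():
        cases[region] += I[city]
        population[region] += N[city]
    return {r: (cases[r], cases[r] / population[r]) for r in population}


# Two hypothetical regions with the same 1% prevalence but a 100-fold
# difference in the number of infected individuals.
summary = regional_summary(
    {"a": "small", "b": "large", "c": "large"},
    {"a": 10, "b": 400, "c": 600},
    {"a": 1_000, "b": 40_000, "c": 60_000},
)
```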

Epidemic heterogeneity and network structure 

In order to discriminate the role of the network structure in the spatio-temporal pattern of the epidemic process, we aim at a more quantitative analysis of its global heterogeneity. This heterogeneity might originate simply from the stochastic nature of the infectious process, or it might be determined by the structural properties of the transportation network. In the latter case, one can envision a larger predictability of the epidemic behavior, reflecting the underlying network structure. Here we introduce for the first time a characterization of the epidemic pattern based on the entropy, a quantity customarily used in information theory to quantify the level of disorder of a signal or system. At each time step, a snapshot of the epidemic pattern is provided by the set of values of the prevalence $i_j(t) = I_j(t)/N_j$ in each city $j$. We can therefore define the normalized vector $\vec{p}$ with components $p_j = i_j / \sum_\ell i_\ell$, which contains the relevant information on the epidemic pattern. In particular, we can measure the level of heterogeneity of the disease prevalence by measuring the disorder encoded in the vector $\vec{p}$ with the normalized entropy function $H$

$$
H(t) = -\frac{1}{\log V} \sum_j p_j(t) \log p_j(t). \tag{5}
$$

If the epidemic affects all nodes homogeneously (i.e. all prevalences are equal), the entropy attains its maximum value $H = 1$. Starting from $H = 0$, which corresponds to a single initially infected city (the most localized and heterogeneous situation), $H(t)$ increases as more cities become infected, thus reducing the level of heterogeneity (see Fig. 3A). It is important to stress that in the present context the entropy has no thermodynamic meaning. It should simply be regarded as an appropriate mathematical tool for quantifying the statistical disorder of a complicated spatio-temporal signal.
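The normalized entropy function $H$ can be computed directly from the per-city infected counts and populations. A minimal Python sketch follows, with a hypothetical function name. It uses the standard convention $0 \log 0 = 0$ for uninfected cities, and it assumes at least two cities ($V \ge 2$) and at least one infected individual.

```python
import math


def normalized_entropy(I, N):
    """H = -(1/log V) * sum_j p_j log p_j, with p_j = i_j / sum_l i_l, i_j = I_j / N_j.

    I, N : sequences of infected counts and populations, one entry per city.
    Assumes V = len(N) >= 2 and at least one infected individual.
    """
    prevalence = [I_j / N_j for I_j, N_j in zip(I, N)]
    total = sum(prevalence)
    H = 0.0
    for i_j in prevalence:
        if i_j > 0:                    # convention: 0 * log 0 = 0
            p_j = i_j / total
            H -= p_j * math.log(p_j)
    return H / math.log(len(N))
```

Equal prevalences in all cities give $H = 1$, and a single infected city gives $H = 0$, matching the two limits discussed above.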

To ascertain the effect of the network structure, we compare the results obtained on the actual network with those obtained on different network models providing null hypotheses (see Fig. 3). The first model network we consider (called HOMN) is a homogeneous Erdős-Rényi random graph with the same number of vertices $V$ as the WAN. It is obtained as follows: for each pair of vertices $(j, \ell)$, an edge is drawn independently with uniform probability $p = \langle k \rangle / V$, where $\langle k \rangle$ is the average degree of the WAN. In this way, we obtain a typical instance of a random graph with a Poisson degree distribution, peaked around the average value $\langle k \rangle$ and decreasing faster than exponentially at large degree values, in strong contrast with the true degree distribution of the WAN. For the second model (called HETN), instead, we retain the exact topology of the real network. In both models, fluxes and populations are taken as uniform and equal to the corresponding averages in the actual air-transportation network.
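The HOMN construction described above is a standard $G(V, p)$ random graph. Below is a minimal standard-library sketch with hypothetical function names and made-up average values. The loop over pairs costs $O(V^2)$, which is still manageable for $V$ of a few thousand. The HETN would be obtained by applying the same uniform attributes to the WAN's own edge list instead of a random one.

```python
import itertools
import random


def homn_edges(V, mean_degree, rng):
    """Erdos-Renyi G(V, p) graph with p = <k> / V, as in the HOMN null model."""
    p = mean_degree / V
    return [(j, l) for j, l in itertools.combinations(range(V), 2)
            if rng.random() < p]


def uniform_attributes(nodes, edges, w_avg, n_avg):
    """Assign the WAN-average flux to every edge and population to every node.

    Used for both null models; the HETN applies it to the WAN's own edges.
    """
    return {e: w_avg for e in edges}, {v: n_avg for v in nodes}


rng = random.Random(4)
V = 1000
edges = homn_edges(V, mean_degree=8.0, rng=rng)
W, N = uniform_attributes(range(V), edges, w_avg=1.0, n_avg=1.0)
```

The resulting mean degree $2|E|/V$ fluctuates around $\langle k \rangle (V-1)/V \approx \langle k \rangle$.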

The differences in the behavior observed in the HOMN, the HETN and the real case provide striking evidence for a direct relation between the network structure and the epidemic pattern. The homogeneous network displays a homogeneous evolution of the epidemic (with $H \simeq 1$) during a long time window, with sharp changes at the beginning and at the end of the spread. We observe a different scenario for heterogeneous networks, where $H$ is significantly smaller than one most of the time, with long tails signalling a long-lasting heterogeneity of the epidemic behavior. Indeed, an analytical inspection of the epidemic equations shows that the broad variability of the contact pattern (degree distribution) and the ratios $w_{j\ell}/N_j$ play an important role in the heterogeneity of the spreading pattern. Strikingly, the curves obtained for the real network and for the HETN are similar. This indicates that, for the airport network, the broad nature of the degree distribution largely determines the overall properties of the epidemic pattern. Figure 3B reports the average entropy profile together with the maximal dispersion obtained for the spreading starting from a given city with different realizations of the noise. The noise clearly has a mild effect, and the average behavior of the entropy is representative of the behavior obtained in each realization. In Fig. 4 we show the percentage of infected cities as a function of time for each null model and for the real case. While the HOMN displays a long time window in which all cities are infected, this interval is much shorter in the HETN and completely absent in the WAN.

Predictability and forecast reliability 

A further major question in the modeling of global epidemics is how to provide adequate information on the reliability of the obtained epidemic forecast, i.e. the epidemic predictability. Indeed, the intrinsic stochasticity of the epidemic spreading makes each realization unique. Reasonable forecasts can therefore be obtained only if all outbreak realizations starting from the same initial conditions, but subject to different noise realizations, are reasonably similar. A convenient quantity to monitor in this respect is the vector $\vec{\pi}(t)$, whose components $\pi_j(t) = I_j(t) / \sum_\ell I_\ell(t)$ are the normalized probabilities that an infected individual is in city $j$. The similarity between two outbreak realizations is quantitatively measured by the statistical similarity of the two realizations of the global epidemic, characterized by the vectors $\vec{\pi}^{\,I}(t)$ and $\vec{\pi}^{\,II}(t)$ respectively. As a measure of statistical similarity $\mathrm{sim}(\vec{\pi}^{\,I}, \vec{\pi}^{\,II})$ we have considered the standard Hellinger affinity

$$
\mathrm{sim}\left(\vec{\pi}^{\,I}, \vec{\pi}^{\,II}\right) = \sum_j \sqrt{\pi_j^{I}\,\pi_j^{II}}. \tag{6}
$$

Normalized similarity measures do not account for the difference in the total epidemic prevalence, so we also have to consider $\mathrm{sim}(\vec{i}^{\,I}, \vec{i}^{\,II})$, where $\vec{i} = (i, 1 - i)$ and $i(t) = \sum_j I_j(t)/N$ is the worldwide epidemic prevalence ($N = \sum_j N_j$ is the total population). We can thus define the overlap function measuring the similarity between two different outbreak realizations as

$$
\Theta(t) = \mathrm{sim}\left(\vec{\pi}^{\,I}, \vec{\pi}^{\,II}\right)\,\mathrm{sim}\left(\vec{i}^{\,I}, \vec{i}^{\,II}\right). \tag{7}
$$

The overlap is maximal ($\Theta(t) = 1$) when the very same cities have the very same number of infectious individuals in both realizations, and $\Theta(t) = 0$ if the two realizations have no infected cities in common at time $t$. Clearly, a large overlap corresponds to a predictable evolution, providing a direct measure of the reliability of the epidemic forecast. In the HOMN we find a significant overlap ($\Theta > 80\%$, see Fig. 5) even at the early stage of the epidemic, the most relevant phase for epidemic surveillance. The picture is different for the HETN and the real airport network, where the predictability is much smaller, especially at the initial stage of the epidemic. These results may be rationalized by relating the level of predictability to the presence of a backbone of dominant spreading channels defining specific "epidemic pathways"
which are weakly affected by the stochastic noise. Epidemic pathways are the outcome of the 
conflict between two different properties of the network. On the one hand, the heterogeneity of 
the connectivity pattern provides a multiplicity of equivalent channels for the travel of infected 
individuals depressing the predictability of the evolution. On the other hand, the heterogeneity 
of traffic flows introduces dominant connections which select preferential pathways increasing 
the epidemic predictability. The heterogeneous connectivity pattern of the HETN and the WAN 
thus generates a multiplicity of channels that decreases the predictability. In the real case the 
lowering of the epidemic predictability also indicates the dominant effect of the topological 
heterogeneity that wins over the opposite tendency of the traffic heterogeneity. The above 
framework is confirmed by the two distinct behaviors depending on the degree of the initial 
infected city. Epidemics starting in initial cities with a hub airport generate realizations whose 
overlap initially decreases to 50-60% because of the many possible equivalent paths resulting 
in a larger differentiation of the epidemic history in each stochastic realization. On the contrary, 
outbreaks from poorly connected initial cities display a large overlap due to the few available 
connections that favor the selection of specific epidemic pathways. 
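The overlap function $\Theta(t)$ is defined as the product of two Hellinger affinities. The first compares the normalized distributions of infected individuals across cities; the second compares the two-state vectors $(i, 1 - i)$ of worldwide prevalence. A minimal Python sketch follows. The function names are hypothetical, and the sketch assumes both realizations have at least one infected individual at the time considered.

```python
import math


def hellinger_affinity(p, q):
    """sim(p, q) = sum_j sqrt(p_j * q_j) for two normalized vectors."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def overlap(I1, I2, N):
    """Theta for two realizations, from infected counts per city and populations.

    Assumes both realizations have at least one infected individual.
    """
    tot1, tot2, pop = sum(I1), sum(I2), sum(N)
    pi1 = [x / tot1 for x in I1]            # where the infected are, run I
    pi2 = [x / tot2 for x in I2]            # where the infected are, run II
    i1, i2 = tot1 / pop, tot2 / pop         # worldwide prevalences
    return hellinger_affinity(pi1, pi2) * hellinger_affinity([i1, 1 - i1], [i2, 1 - i2])
```

Identical realizations give $\Theta = 1$. Realizations with no infected city in common give $\Theta = 0$, because every product $\pi_j^{I}\pi_j^{II}$ vanishes.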

Outlook 

Our study shows that the properties of the air transportation network are responsible for the global pattern of emerging diseases. In this perspective, the complex features characterizing this network are the origin of the heterogeneous and seemingly erratic spreading of diseases such as SARS on the global scale. The analysis provided here shows that large-scale mathematical models that fully take into account the complexity of the transportation matrix can be used to obtain detailed forecasts of emergent disease outbreaks. We have also shown that it is possible to provide quantitative measurements of the predictability of epidemic patterns, providing a tool that might be used to obtain confidence intervals in epidemic forecasts and in the risk analysis of containment scenarios. Clearly, to make the forecasts more realistic it is necessary to introduce more details into the disease dynamics. In particular, seasonal effects and geographical heterogeneity in the basic transmission rate (due to different hygienic conditions and health care systems in different countries) should be addressed. Finally, the interrelation of the air transportation network with other transportation systems, such as railways and highways, could be very useful for forecasts on longer time scales. We believe, however, that a basic understanding of how the complex features of the transportation network interact with disease spreading could be a valuable tool for testing travel restrictions and vaccination policies in new pandemic events. The same holds for the detailed modeling obtained by fully considering these features.

We thank IATA for making the airline commercial flight database available to us. M.B. is on leave of
absence from CEA, Departement de Physique Theorique et Appliquee BP12, 91680 Bruyeres-Le-Chatel, 
France. 
Figure 1: Properties of the worldwide airport network. Statistical fluctuations are observed over a broad range of length scales. (A) The degree distribution $P(k)$ follows a power-law behavior over almost two decades, with exponent $1.8 \pm 0.2$. (B) The distribution of the weights (fluxes) is skewed and heavy-tailed. (C) The distribution of populations is heavy-tailed, in agreement with the commonly observed Zipf's law [27]. (D) The city population varies with the traffic of the corresponding airport as $N \sim T^{\alpha}$ with $\alpha \simeq 0.5$, in contrast with the linear behavior postulated in previous works [23].
Figure 2: Geographical representation of the disease evolution in the US for an epidemic starting in Hong Kong, based on an SIR dynamics within each city. States are grouped according to the nine influenza surveillance regions. The color code corresponds to the prevalence in each region, from 0 to the maximum value reached ($p_{max}$). On the left the original US maps are shown, while on the right we provide the corresponding cartograms obtained by rescaling each region according to its population. Three representations of the airport network restricted to the US are also shown, corresponding to three different snapshots. The nodes represent the 100 US airports with the highest traffic $T$; their color is assigned according to the color code adopted for the maps.

[Figure 3 plot not reproduced; horizontal axes: time (days)]

Figure 3: Analysis of the heterogeneity of the epidemic pattern in the actual network (WAN) 
compared with the two network models (HOMN) and (HETN). An SIR dynamics is adopted 
within each city. (A) Entropy H(t) averaged over distinct initial infected cities and over noise 
realizations. Each profile is divided into three different phases, the central one corresponding 
to H > 0.9, i.e. to a homogeneous geographical spread of the disease. This phase is much 
longer for the HOMN than for the real airport network. The behavior observed in HETN is 
close to the real case meaning that the connectivity pattern plays a leading role in the epidemic 
behavior. (B) Average value of the entropy, with the maximal dispersion obtained from 2·10^… noise realizations of an epidemic starting in Hong Kong. Fluctuations have a mild effect in all cases.
Figure 4: Percentage of infected cities as a function of time for an epidemic starting in Hong Kong, based on an SIR dynamics within each city. The HOMN case displays a large interval in which all cities are infected. The HETN and the real case show a smoother profile with long tails, a signature of a long-lasting geographical heterogeneity of the epidemic diffusion.
Figure 5: Percentage of overlap as a function of time. The shaded area corresponds to the standard deviation obtained with 5·10^… pairs of different realizations of the global spreading model based on an SIR dynamics within each city. Topological heterogeneity plays a dominant role in reducing the overlap in the early stage of the epidemic. We observe two different behaviors depending on the degree of the initially infected city: a reduced initial predictability in the case of airport hubs (left) with respect to poorly connected cities (right). Large fluctuations at the end of the epidemic are observed in the HETN and in the real case, due to the different lifetimes of the epidemic in distinct realizations induced by the heterogeneity of the network. We also report the prevalence profile as a function of time, showing that the maximum predictability corresponds to a prevalence peak.